Cost Weighting for Neural Machine Translation Domain Adaptation

نویسندگان

  • Boxing Chen
  • Colin Cherry
  • George F. Foster
  • Samuel Larkin
چکیده

In this paper, we propose a new domain adaptation technique for neural machine translation called cost weighting, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available. Cost weighting incorporates a domain classifier into the neural machine translation training algorithm, using features derived from the encoder representation in order to distinguish in-domain from out-of-domain data. Classifier probabilities are used to weight sentences according to their domain similarity when updating the parameters of the neural translation model. We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting. Experiments on two large-data tasks show that both the traditional techniques and our novel proposal lead to significant gains, with cost weighting outperforming the traditional methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Instance Weighting for Neural Machine Translation Domain Adaptation

Instance weighting has been widely applied to phrase-based machine translation domain adaptation. However, it is challenging to be applied to Neural Machine Translation (NMT) directly, because NMT is not a linear model. In this paper, two instance weighting technologies, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation...

متن کامل

Translation Model Based Weighting for Phrase Extraction

Domain adaptation for statistical machine translation is the task of altering general models to improve performance on the test domain. In this work, we suggest several novel weighting schemes based on translation models for adapted phrase extraction. To calculate the weights, we first phrase align the general bilingual training data, then, using domain specific translation models, the aligned ...

متن کامل

Resampling Approach for Instance-based Domain Adaptation from Patent Domain to Newspaper Domain in Statistical Machine Translation

In this paper, we investigate a resampling approach for domain adaptation from a resource-rich domain (patent domain) to a resource-scarce target domain (newspaper domain) in Statistical Machine Translation (SMT). We propose two resampling methods for domain adaptation in SMT: random resampling and resampling for instance weighting. The random resampling randomly adds sentence pairs from the re...

متن کامل

Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation

We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus component...

متن کامل

Domain specialization: a post-training domain adaptation for Neural Machine Translation

Domain adaptation is a key feature in Machine Translation. It generally encompasses terminology, domain and style adaptation, especially for human postediting workflows in Computer Assisted Translation (CAT). With Neural Machine Translation (NMT), we introduce a new notion of domain adaptation that we call “specialization” and which is showing promising results both in the learning speed and in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017